Mitigating Adversarial Norm Training with Moral Axioms

Authors

Abstract

This paper addresses the issue of adversarial attacks on ethical AI systems. We investigate using moral axioms and rules of deontic logic in a norm learning framework to mitigate adversarial norm training. This model of moral intuition and construction provides AI systems with moral guard rails yet still allows for learning conventions. We evaluate our approach by drawing inspiration from a study commonly used in moral development research. This questionnaire aims to test an agent's ability to reason to moral conclusions despite opposed testimony. Our findings suggest that our model can correctly evaluate moral situations and learn conventions in an adversarial training environment. We conclude that adding axiomatic moral prohibitions and deontic inference rules to a norm learning model makes it less vulnerable to adversarial attacks.

Similar resources

Mitigating Unwanted Biases with Adversarial Learning

Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the net...

Towards Mitigating Audio Adversarial Perturbations

Audio adversarial examples targeting automatic speech recognition systems have recently been made possible in different tasks, such as speech-to-text translation and speech classification. Here we aim to explore the robustness of these audio adversarial examples generated via two attack strategies by applying different signal processing methods to recover the original audio sequence. In additio...

Mitigating adversarial effects through randomization

Convolutional neural networks have demonstrated their powerful ability on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. I.e., clean images, with imperceptible perturbations added, can easily cause convolutional neural networks to fail. In this paper, we propose to utilize randomization to mitigate adversarial effects. Specifically, we use two ran...

Axioms for the Norm Residue Isomorphism

We give an axiomatic framework for proving that the norm residue map is an isomorphism (i.e., for settling the motivic Bloch-Kato conjecture). This framework is a part of the Voevodsky-Rost program.

Adversarial Source Identification Game with Corrupted Training

We study a variant of the source identification game with training data in which part of the training data is corrupted by an attacker. In the addressed scenario, the defender aims at deciding whether a test sequence has been drawn according to a discrete memoryless source X ∼ PX , whose statistics are known to him through the observation of a training sequence generated by X . In order to unde...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i10.26402